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Although changes in chromatin are integral to transcriptional reprogramming during cellular differentiation, it is 
currently unclear how chromatin modifications are targeted to specific loci. To systematically identify transcription 
factors (TFs) that can direct chromatin changes during cell fate decisions, we model the relationship between genome- 
wide dynamics of chromatin marks and the local occurrence of computationally predicted TF binding sites. By applying 
this computational approach to a time course of Polycomb-mediated H3K27me3 marks during neuronal differentia- 
tion of murine stem cells, we identify several motifs that likely regulate the dynamics of this chromatin mark. Among 
these, the sites bound by REST and by the SNAIL family of TFs are predicted to transiently recruit H3K27me3 in 
neuronal progenitors. We validate these predictions experimentally and show that absence of REST indeed causes loss 
of H3K27me3 at target promoters in trans, specifically at the neuronal progenitor state. Moreover, using targeted 
transgenic insertion, we show that promoter fragments containing REST or SNAIL binding sites are sufficient to recruit 
H3K27me3 in cis, while deletion of these sites results in loss of H3K27me3. These findings illustrate that the occurrence 
of TF binding sites can determine chromatin dynamics. Local determination of Polycomb activity by REST and SNAIL 
motifs exemplifies such TF based regulation of chromatin. Furthermore, our results show that key TFs can be identified 
ab initio through computational modeling of epigenome data sets using a modeling approach that we make readily 
accessible. 



[Supplemental material is available for this article.] 

Cellular differentiation entails organized changes in gene expres- 
sion. Pluripotent stem cells that commit to a somatic fate have to 
stably repress pluripotency genes and activate lineage specific genes 
in a temporally correct fashion. This regulation is coordinated by 
transcription factors (TFs) in concert with dynamic changes in 
local chromatin organization of the DNA template. These changes 
have recently been documented in genome-wide analyses of his- 
tone modifications and DNA methylation (Mikkelsen et al. 2007; 
Meissner et al. 2008; Mohn et al. 2008; Zhou et al. 2011). Together 
with genetic studies, epigenome maps have helped to establish the 
relevance of specific reprogramming of chromatin during differ- 
entiation. While several large international efforts to gather epi- 
genome data have been launched (Satterlee et al. 2010; Abbott 
201 1), only limited tools exist to determine the regulatory circuitry 
that guides chromatin dynamics. 

Chromatin modifications can act upstream of TF activity by 
inhibiting or enhancing their ability to bind their cognate sites in 
the DNA (Barrera and Ren 2006; Kouzarides 2007). In turn, TFs can 
also act upstream of chromatin modifications by recruiting chro- 
matin modifying enzymes, that modify the epigenome (Chan and 
La Thangue 2001; Lee et al. 2005). It is this latter mechanism that 
we wish to investigate here in a systematic manner. Since mam- 
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malian genomes encode an estimated 1500-2000 TFs (Vaquerizas 
et al. 2009), a comprehensive experimental investigation of all TFs 
is precluded, and other strategies are thus required to identify 
candidate TFs that are involved in particular aspects of chromatin 
regulation. To address this need, we adapted our recently published 
MARA (motif activity response analysis) approach, which models 
gene expression dynamics in terms of predicted transcription 
factor binding sites (TFBS) (FANTOM Consortium and Riken 
Omics Science Center 2009), to, instead, model genome-wide 
measured chromatin dynamics. The resulting Epi-MARA (Epigenome- 
motif activity response analysis) provides an analytical approach 
to identify TFs associated with chromatin reorganization ab initio, 
which we have made directly accessible through a web server 
(http://ismara.unibas.ch). 

Here we use this approach to identify TFs that are involved in 
dynamic changes of a chromatin modification set by the Poly- 
comb system, arguably the most relevant gene repression system 
during development (Schuettengruber and Cavalli 2009; Simon 
and Kingston 2009; Beisel and Paro 201 1; Margueron and Reinberg 
2011). A central component of Polycomb-mediated silencing is 
trimethylation of lysine 27 on histone H3 (H3K27me3), which is 
set by the Polycomb repressive complex 2 (PRC 2). The protein 
enhancer of zeste homolog 2 (EZH2) catalyzes the methylation of 
H3K27 as part of PRC2 (Czermin et al. 2002; Muller et al. 2002) and 
it is required for differentiation and reprogramming (O'Carroll 
et al. 2001; Pereira et al. 2010). Target genes include important 
developmental regulators in mouse embryonic stem (ES) cells (Boyer 
et al. 2006) and are in part cell-type specific (Bracken et al. 2006; 
Mikkelsen et al. 2007; Mohn et al. 2008). Although DNA binding 
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factors with limited sequence specificity have been implicated in 
targeting of the Poly comb system in flies (Ringrose and Paro 2007; 
Schwartz and Pinotta 2008), the question of how Polycomb targets 
are specified remains currently unresolved, especially in vertebrates 
(Simon and Kingston 2009; Beisel and Paro 2011). We applied Epi- 
MARA on three developmental time points, where we measured the 
H3K27me3 chromatin mark during in vitro neurogenesis starting 
from murine stem cells (Mohn et al. 2008), resulting in a number of 
putative TFs involved in regulating different aspects of this chro- 
matin reorganization. Among the top predictions were that binding 
sites for the SNAIL family of TFs and for sites of the TF REl-silencing 
transcription factor (REST) are associated with a transient increase in 
H3K27me3 at promoters in neuronal progenitors (NPs). We experi- 
mentally validate the role of REST using genome-wide mapping of 
REST binding and H3K27me3 levels in both wild-type and REST- 
knockout cells. Furthermore, using transgenic constructs we show 
that promoter fragments containing REST or SNAIL binding sites are 
sufficient to recruit H3K27me3 while fragments in which these sites 
are deleted show reduced H3K27me3 levels. These results provide 
clear examples in vertebrates of TFs that locally influence Polycomb 
activity and, more generally, illustrate that TFs with regulatory 
function for epigenome reprogramming can be identified ab initio 
using computational modeling. 

Results 

Predicting mediators of chromatin changes using Epi-MARA 

Comprehensive mapping of promoter regions (Harbers and 
Carninci 2005; de Hoon and Hayashizaki 2008; Balwierz et al. 
2009) combined with comparative genomic prediction of TFBSs 
(van Nimwegen 2007) for known TF binding motifs (Wasserman 
and Sandelin 2004) have enabled researchers to study to what 
extent local occurrence of TFBSs can explain patterns of gene ex- 
pression (Beer and Tavazoie 2004; Gao et al. 2004; Das et al. 2006; 
FANTOM Consortium and Riken Omics Science Center 2009). 
Here, we ask to what extent dynamic changes in chromatin can be 
explained by local TFBS occurrence and aim to identify TFs that are 
involved in modulating chromatin locally. 

To address this question systematically, we adapted our re- 
cently developed MARA (FANTOM Consortium and Riken Omics 
Science Center 2009), which models mRNA expression dynamics 
in terms of predicted TFBSs, to model genome-wide patterns of 
epigenetic marks, and termed this approach Epi-MARA (Fig. 1). 
Concretely, if M ps quantifies the amount of a particular epigenetic 
mark M at promoter p in sample s, and N pm denotes the total 
number of predicted binding sites for regulatory motif m in pro- 
moter p, then we assume a linear model of the following form: 

M ps = noise + C p + £ m N pm A ms , ( 1 ) 

where c p is the basal level of the chromatin mark at promoter p, and 
A ms is the unknown activity of motif m in sample s, which is 
inferred by Epi-MARA (see Methods). Abstractly speaking, the ac- 
tivity A ms quantifies how much each occurrence of motif m con- 
tributes to the level of epigenetic mark M in sample s. One can 
think of A ms as reflecting the amount of TF binding to sites of motif 
m and the resulting effect on chromatin mark M. Thus, whenever 
Epi-MARA infers a highly positive activity A ms , this predicts that 
the binding TF recruits the chromatin mark at stage s, whereas 
a highly negative A ms implies that the binding TF inhibits de- 
position of the mark at stage s. Notably, since the average level at 
each promoter p is fitted by the parameter c p , the activities A ms 



capture the relative contributions of a motif across the different 
stages and will thus include both positive and negative activities. 

Notably, it is not the aim of Epi-MARA to provide accurate fits 
of epigenetic profiles at individual promoters. Since the actual 
levels of a chromatin mark at any promoter are likely a complex 
function of many variables acting both in cis and in trans, the 
simple linear model of Equation 1 typically captures only part of 
the variance in epigenetic mark levels. Importantly, however, the 
motif activities are inferred from the combined statistics of the 
hundreds to thousands of promoters that contain a site for a given 
motif. Thus, the linear model applied by Epi-MARA effectively 
averages out the complications at individual promoters, and the 
remaining signal provides a robust statistical average activity for 
each motif, enabling reliable prediction of the TFs involved in 
chromatin mark dynamics. To allow easy application of this 
method, we have made automated Epi-MARA analysis available 
online (http : / / ismara .unibas . ch) . 

As a biological model of dynamic changes of transcriptome 
and epigenome, we used a well-characterized mouse differentia- 
tion system, which progresses from ES cells to terminal neurons 
(TNs) through a defined NP state (Bibel et al. 2004, 2007; Plachta 
et al. 2004). We set out to identify the possible role of TFs in cell- 
type specific targeting of Polycomb-mediated H3K27 trimethyla- 
tion in this system and applied Epi-MARA to our data set of 
H3K27me3 at promoters in the ES, NP, and TN stages (Mohn et al. 

2008) . The general approach is shown in Figure 1 together with the 
predicted activities of the nine motifs that contributed most to 
explaining the genome-wide H3K27me3 dynamics at promoters as 
indicated by a z-statistic (see Supplemental Fig. 1 for a more ex- 
tensive list). Five of these nine, i.e., SP1, Snail (SNAI1), ZEB1, REST, 
and ARNT/AHR, show a pattern in which there is a strong transient 
increase in motif activity at the NP stage. That is, Epi-MARA pre- 
dicts the TFs binding these motifs to be involved in the re- 
cruitment of H3K27me3 going from the ES to the NP stage. Of 
these candidate TFs we chose REST as a target for in-depth exper- 
imental validation as it is the only one of these motifs that is likely 
bound by a single TF and thus highly suitable for functional testing 
by genetic deletion. In contrast, Snail, ZEB1, and SP1 motifs can 
each be recognized by multiple TFs (Postigo and Dean 2000; 
Bouwman and Philipsen 2002; Nieto 2002). 

To compare the activity of TFs in regulating chromatin dy- 
namics with their activities regulating expression we also analyzed 
transcriptome data of the three consecutive stages using the MARA 
method (FANTOM Consortium and Riken Omics Science Center 

2009) . One of the motifs that, according to the MARA analysis, 
most significantly regulate expression changes is the E2F motif 
(Supplemental Fig. 2). Its inferred transcriptional activity is highly 
positive in the ES and NP stages where cells are proliferating, while 
it strongly decreases at the TN stage where cells are post-mitotic 
and have exited the cell cycle. This is consistent with the known 
function of the E2F family of cell-cycle regulators that bind to this 
motif (Tao et al. 1997). In contrast, Epi-MARA predicts no signifi- 
cant activity on H3K27me3 dynamics for the E2F motif. 

Interestingly, REST is also inferred to have an important role 
in driving expression changes, and its activity profile is consistent 
with its known role as a repressor of neuronal genes in non- 
neuronal tissues (Schoenherr and Anderson 1995). That is, REST 
target genes become active at the TN stage where REST itself is 
down-regulated (Supplemental Fig. 3 A). However, the activity 
profile of REST directing expression changes (Supplemental Fig. 2) 
is clearly distinct from its activity profile directing H3K27me3 
(Fig. 1), suggesting that REST's effects on transcription levels are at 
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least partially independent from its effects 
on H3K27me3 levels. Notably, we find that 
predicted REST sites have higher than aver- 
age H3K27me3 levels at all three time points 
in line with previous observation in in- 
dividual cell states (Supplemental Table 1; 
Zheng et al. 2009; Liu et al. 2010). We note 
that two recent studies, which appeared well 
past our decision to functionally validate the 
role of REST, reported biochemical interaction 
between REST and members of the Polycomb 
group (Ren and Kerppola 201 1; Dietrich et al. 
2012). However, these observations of a gen- 
eral co-occurrence of REST and Polycomb do 
not predict the cell-type specific activity for 
REST, which depends on the analysis of dy- 
namic changes in H3K27me3 levels across the 
time course. 

Experimentally determined REST binding 
sites support the computational 
prediction 

To ask whether Epi-MARA's activity pre- 
diction, which is based on computationally 
predicted REST sites, is confirmed by REST 
binding sites that are indeed occupied by the 
factor, we mapped REST binding at the ES 
and NP stages. We carried out chromatin 
immunoprecipitation (ChIP) of REST bound 
DNA and subjected the precipitated DNA to 
high throughput sequencing (ChlP-seq). 
Peak finding was done on pooled replicates 
and revealed 1599 REST binding peaks in ES 
cells and 1035 in progenitors. Identified 
binding sites show a large overlap to those 
previously reported (Supplemental Fig. 4; 
Johnson et al. 2008). The reduced number of 
peaks in progenitors likely reflects the fact 
that REST protein levels decrease during 
neuronal differentiation (Supplemental Fig. 
3A). In agreement with this hypothesis 97% 
of the peaks present in progenitors are also 
present in stem cells. The majority of REST 
peaks contain a predicted binding site (Sup- 
plemental Table 2), and the number of pre- 
dicted sites and the amount of binding as 
assayed by ChlP-seq correlate positively (r = 
0.48, P-value 2.9 x 10" 53 ). As we recently 
showed (Arnold et al. 2012), the rate of false 
positive predicted REST sites is generally low. 
Of the 24,004 promoters without evidence of 
REST binding, only 164 (0.7%) were predicted 
to harbor a REST binding site. REST binding 
occurs preferentially in proximity to tran- 
scription start sites (TSS) (Supplemental Fig. 
3B) and we classified genes with REST binding 
within ±2 kb of the TSS as potentially regu- 
lated by this factor (Supplemental Fig. 3C). 
Mammalian promoters separate into two 
classes associated with either high or low 
density of CpG dinucleotides (Bird 1986; 
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Carninci et al. 2006; Balwierz et al. 2009), 
and we observe that REST predominantly 
targets high-CpG promoters (Supplemen- 
tal Table 3). Interestingly promoter-prox- 
imal REST binding sites show a distinct 
positioning immediately downstream 
from TSS (Sun et al. 2005; Zhang et al. 
2006), which we also observe for both 
predicted and measured REST binding 
(Fig. 2A). While there is general agree- 
ment between predicted and measured 
REST binding, not all predicted pro- 
moter sites are occupied and some of the 
promoter-proximal REST peaks lay just 
outside the regions covered by the com- 
putational predictions. We therefore asked 

whether Epi-MARA predicts different activities for REST if we replace 
the computationally predicted REST sites with the actual binding 
data (see Methods). This analysis resulted in a strikingly identical 
activity profile for REST, but with much larger significance as the 
z-value almost doubled (Fig. 2B). These results not only support 
the REST activity profile inferred using the TFBS predictions but 
also illustrate how actual in vivo binding data can be incorporated, 
which, in this case, increased the accuracy of Epi-MARA's inference. 

REST binding is associated with H3K27me3 dynamics 
genome-wide 

Next, we assessed H3K27me3 dynamics beyond promoter regions 
by performing ChlP-seq at the three differentiation stages and 
determined all genomic regions that were enriched for H3K27me3 
in at least one of the cellular states (see Methods). First, we noted 
that H3K27me3 levels peak immediately downstream from the TSS 
very similar to the binding pattern of REST (Supplemental Fig. 5 A). 
Moreover, H3K27me3 levels peak around REST binding sites, sug- 
gesting that the TF and chromatin mark colocalize (Supplemental 
Fig. 5B). If REST is mediating deposition of H3K27me3 by recruit- 
ment of Polycomb, we would expect REST binding to also colocalize 
with members of the PRC2 complex. To test this we analyzed SUZ12 
ChlP-seq binding data from mouse ES cells (Pasini et al. 2010) and 
NPs around REST binding sites. Importantly, we find that SUZ12 is 
even more localized at REST sites than H3K27me3 (Supplemental 
Fig. 5B). 

Although many H3K27me3 enriched regions occur proximal 
to promoters, more than two-thirds of H3K27me3 enriched re- 
gions are distal to promoters. However, these distal H3K27me3 
regions are much less likely to be targeted by REST than promoter- 
proximal regions (Supplemental Table 4). Given REST's preferred 
targeting to high-CpG promoters, we investigated the CpG con- 
tent of all H3K27me3 regions and found that, strikingly, these 
separate into high- and low-CpG classes, similar to promoters 



Figure 1 . Epi-MARA's approach to predicting transcription factor activities that explain dynamics in 
H3K27me3 levels during neuronal differentiation. Transcription factor binding sites were predicted in 
proximal promoters genome-wide, using a Bayesian method that explicitly models binding site 
evolution. Epi-MARA models measured chromatin dynamics in terms of predicted TFBSs. M ps quan- 
tifies the amount of a particular epigenetic mark M at promoter p in sample s, N pm denotes the total 
number of predicted binding sites for regulatory motif m in promoter p, c p indicates the basal level of 
the mark at promoter p, and A ms is the unknown activity of motif m in sample s, which is inferred by the 
method. Depicted are the normalized activity profiles of the top nine motifs (green lines, with standard 
errors indicated) with their respective z-values. The three time points correspond to the embryonic 
stem cell (ES), neuronal progenitor (NP), and terminal neuron (TN) stage. (Insets) Sequence logos of 
each of the motifs and the transcription factors thought to bind to them are shown. 
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Figure 2. Analysis of REST binding data supports computational predictions. (A) Frequency of pre- 
dicted (green line) and measured (blue line) binding sites around TSSs. (B) REST activity profiles cal- 
culated by Epi-MARA are similar when using either computationally predicted (green line) or measured 
REST binding sites (blue line). The prediction has higher significance when using the measured sites as 
indicated by the higher z-value (i.e., higher variance in activity relative to the error bars). 



(Fig. 3A). Moreover, CpG content cleanly distinguishes proximal 
and distal H3K27me3 regions, with 85% of proximal regions be- 
ing high-CpG and 75% of distal regions being low-CpG (Fig. 3A). 
High-CpG regions are further distinct as they show higher levels 
of H3K27me3 than low-CpG regions (Supplemental Fig. 5C). 
Motivated by these differences, we asked whether Epi-MARA 
predicts different motif activities for REST if we analyze high- and 
low-CpG regions separately (see Methods). For high-CpG regions 
Epi-MARA predicts the same general activity profile for REST as 
previously for promoters, but with even higher significance (Fig. 
3B). Strikingly, for low-CpG regions REST's significance is not 
only reduced but the inferred activity is almost "opposite" to that 
of REST on high-CpG regions (Fig. 3B), i.e., with a transient "loss" 
of H3K27me3 at the NP stage. Interestingly, high- and low-CpG 
regions have distinct H3K27me3 dynamics in general and the dy- 
namics observed at REST targets are consistent with Epi-MARA's 
predictions (Fig. 3C,D). 

In summary, genome-wide analysis of H3K27me3 levels pre- 
dicts that REST binding at high-CpG regions, which includes most 
promoter-proximal REST targets, leads to a transient gain in 
H3K27me3 at the NP stage. In addition, a less significant transient 
loss of H3K27me3 at the NP stage for low-CpG regions is also 
predicted by Epi-MARA. We next tested these predictions by ana- 
lyzing cells in which the Rest gene is deleted. 



REST protein is required for local H3K27 methylation levels 

REST is an essential protein for development as knockout mice die 
at embryonic day 11.5 (Chen et al. 1998). However, knockout ES 
cells (RESTko) are viable and show no defects in pluripotency 
(Jorgensen et al. 2009; Yamada et al. 2010), enabling us to test if 
they are competent to undergo neuronal differentiation in our in 
vitro system. Here, RESTko cells formed morphologically normal 
neurons with high efficiency, correct marker protein expression, 
and limited changes in gene expression (Supplemental Figs. 6, 7), 
suggesting that REST is not essential for 

the initial steps of neuronal differentiation 

in vitro. 

Next, we measured genome-wide 
H3K27me3 levels in RESTko cells at the 
stem cell and progenitor stages to in- 
vestigate whether REST's absence affects 
H3K27me3 levels at its target genes. We sep- 
arated all regions enriched for H3K27me3 
at any of the stages into high-CpG versus 
low-CpG and further into REST-target and 
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Figure 3. REST is associated with H3K27me3 dynamics at high- and low-CpG regions genome-wide. (A) The distribution of CpG dinucleotide fre- 
quencies of H3K27me3 regions genome-wide is bimodal and can be fit by a mixture of two log-normal distributions (red and blue lines) corresponding to 
high- and low-CpG regions, respectively. (Inset) Numbers of K27me3 regions that are promoter-proximal and distal for high-CpG and low-CpG regions. 
(B) REST activity profiles on high- (red) and low-CpG regions (blue) as inferred by running Epi-MARA on all H3K27me3 regions genome-wide show 
a transient gain and loss, respectively, at the NP stage. Note that, whereas REST activity on the high-CpG regions is highly significant, on the low-CpG 
regions REST activity has a much weaker significance. (C) Reverse cumulative distributions of changes in H3K27me3 levels at the transition from ES to NP 
stage. We divided regions that were enriched for H3K27me3 into high-CpG/low-CpG (red/blue) and REST-target/nontarget (solid/broken lines) regions. 
At high-CpG regions REST targets tend to gain H3K27me3 going from the ES to the NP stage whereas nontarget regions are equally likely to gain or lose 
H3K27me3. In contrast, most low-CpG regions lose H3K27me3 going to the NP stage, and REST targets tend to lose even more H3K27me3. (D) As in panel 
C but now for the transition from the NP to the TN stage. High-CpG regions generally tend to lose H3K27me3 and REST targets tend to lose even more, 
whereas low-CpG regions tend to gain H3K27me3 and REST targets tend to gain even more. 



nontarget (see Methods). Next, we compared H3K27me3 levels in 
wild-type and RESTko cells between these four classes. This reveals 
little difference between REST target regions and nontarget regions 
at the ES stage (Table 1; Fig. 4B), in line with Epi-MARA's predicted 
REST activity at this stage. In contrast at the NP stage, as exem- 
plified at two loci in Figure 4A, we observe a substantial loss of 
H3K27me3 in the RESTko cells relative to wild-type cells, affecting 
a substantial number of high-CpG REST targets (Table 1; Fig. 4B; 
Supplemental Fig. 8). In addition, although the changes at low- 
CpG regions are much weaker, a notable gain of H3K27me3 is ob- 
served at low-CpG REST targets (Fig. 4B). This experimentally con- 
firms Epi-MARA's predictions for REST at both high- and low-CpG 
regions. We conclude that REST contributes functionally to local 
levels of H3K27me3, which is strongest at high-CpG regions in NPs. 
Next we tested if the observed loss of H3K27me3 is accompanied by 
a loss of PRC2, which mediates the H3K27me3 mark. We compared 
occupancy of the PRC2 component SUZ12 in RESTwt and RESTko 
NPs. This reveals a loss of SUZ12 at a substantial number of high- 
CpG REST targets (Supplemental Fig. 9A) and a loss of colocalization 
of SUZ12 with REST binding (Supplemental Fig. 9B). Moreover, 
compatible with a role for REST in Polycomb recruitment, there is 
a correlation between reduction in SUZ12 levels and reduction in 
K27me3 levels at high-CpG REST targets (Supplemental Fig. 9C). 



REST affects H3K27me3 and expression independently 
at many target genes 

Since REST is an established repressor of gene activity, it is con- 
ceivable that loss of H3K27me3 at proximal REST targets is a direct 



Table 1 . Estimated percentages of REST targets that significantly 
lose/gain H3K27me3 in the RESTko cells, separately at low- and 
high-CpG regions, and separately at the ES and NP stages 





Percentage that 


Percentage that 


Class and Stage 


loses K27me3 


gains K27me3 


ES low-CpG 


4.9% ± 3.2% 


2.9% ± 2.3% 


ES high-CpG 


5.2% ± 2.3% 


0.9% ± 0.8% 


NP low-CpG 


1 .4% ± 4.4% 


1 2.4% ± 4.4% 


NP high-CpG 


21.7% ±2.8% 


0.8% ± 0.7% 



Using as a cutoff targets that change by more than one standard deviation 
(z > 1 and z < -1 for loss and gain, respectively) we conservatively esti- 
mated the fraction of true targets as the percentage of REST targets with 
a z-value larger than the cutoff in addition to the percentage expected by 
chance based on the standard normal distribution. Error bars are based on 
a Bayesian inference procedure (see Methods). Note that the largest per- 
centage of true targets is observed for high-CpG regions losing H3K27e3 at 
the NP stage, followed by low-CpG regions gaining H3K27me3 atthe same 
stage. 
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Figure 4. REST is required for H3K27me3 dynamics in NP cells. (A) ChlP-Seq signal for H3K27me3 and REST in representative genomic regions. Shown 
are H3K27me3 signal in ES cells, NPs of wild-type (WT) and RESTko cells, as well as REST signal in NPs. The top panel exemplifies selective loss of H3K27me3 
atthe REST binding site of the Xkr7 locus, whereas neighboring regions (BC020535) remain unaffected. The /owerpanel shows similar loss of H3K27me3 at 
the Stmn2 locus. Both the Xkr7 and Stmn2 loci are examples of promoter-proximal high-CpG regions. Shown are normalized read densities. The red bars at 
the REST peaks indicate the regions cloned for transgenic experiments. (B) Global comparison of H3K27me3 levels between wild-type and RESTko cells. 
Shown are the normalized distributions (see Methods) of the ratio between H3K27me3 in wild type versus RESTko for nontarget regions (black lines) and 
for either low-CpG (blue lines) or high-CpG (red lines) regions that are REST targets at the ES (left panel) and NP (right panel) stage. (Insets) Estimated 
fractions of REST targets that significantly lose or gain H3K27me3 in the RESTko at high-CpG (red) and low-CpG regions (blue). There are few significantly 
changing targets at the ES stage. At the NP stage a significant fraction of high-CpG targets lose H3K27me3 and a smaller but still significant fraction of low- 
CpG targets gain H3K27me3 in the RESTko cells. 



consequence of transcriptional up-regulation. This would imply 
that all genes with REST-dependent loss of H3K27me3 are tran- 
scriptionally up-regulated in RESTko cells. Although, as expected 
from a known repressive mark, there is a positive correlation be- 
tween H3K27me3 loss and gene expression, this correlation is rather 
weak (r=0.28 inES andr=0.44 inNP, Supplemental Fig. 1 OA). Most 
importantly, one-third of the regions that lose H3K27me3 at the NP 
stage are not significantly transcriptionally up-regulated (Supple- 
mental Fig. 10). We thus conclude that the crosstalk between REST 



and the Polycomb pathway is independent of transcriptional 
changes at a substantial number of REST targets. 

Promoter fragments containing REST or SNAIL binding sites 
locally recruit methylation of H3K27 

Having established that absence of REST protein leads to a decrease 
of H3K27me3 at high-CpG binding sites, we wanted to further ask 
whether fragments of high-CpG promoter regions containing 
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a REST site can recruit H3K27me3, and whether the REST binding 
site contributes to this recruitment. To this end we generated re- 
porter constructs consisting of 1.2-2-kb promoter fragments con- 
taining a REST site, and mutant versions in which the REST site had 
been deleted. To ensure comparable chromatin organization we 
placed these sequence variants in wild-type cells into the same 
chromosomal locus using a Cre-recombinase based targeting sys- 
tem (Feng et al. 1999; Lienert et al. 2011). This site-specific tar- 
geting further enables us to control for genomic environment and 
thus to directly compare wild-type and mutant sequences (Fig. 5 A). 
Importantly the chosen "test site" is positioned within a genomic 
region that harbors no H3K27me3 and no REST binding (Lienert 
et al. 2011; Stadleret al. 2011). Thus, any REST or H3K27me3 signal 
should primarily reflect the recruitment abilities of the inserted 
sequence fragments. We inserted wild-type and mutated (AREST) 
promoter fragments (Fig. 5B) of the following genes: Stmn2, Xkr7, 
Bdnf, and Pgbd5. After targeted insertion and differentiation into 
NPs we detected strong REST binding by ChIP to the wild type, but 
no or weak binding in the four REST mutant sequences, showing 
that the REST site is required for REST binding to the reporter 
constructs (Supplemental Fig. 11). Importantly, H3K27me3 is ob- 
served at all promoter fragments containing the REST site at the 
progenitor stage, whereas the mutant sequences show significant 
loss of H3K27me3 (Fig. 5C). In the case of the stathmin-like 2 
(Stmn2) promoter, presence of the REST site results in a more than 
threefold increase of H3K27me3 signal. Notably, the endogenous 
Stmn2 promoter shows no transcriptional response in RESTko cells. 
Of all four tested promoter fragments the Pgbd5 fragment shows the 
weakest loss of H3K27me3. Notably, the corresponding loss of REST 
binding at this promoter is also the weakest (Supplemental Fig. 11), 
suggesting that a cryptic binding site may still remain at this frag- 
ment. Together with the observed changes in H3K27me3 levels at 
genome-wide REST targets in the RESTko cells these results firmly 
establish that REST binding mediates Polycomb targeting and 
contributes to local levels of H3K27 methylation. 

Besides REST, several motifs that Epi-MARA predicted to play 
a role in H3K27me3 dynamics are recognized by a family of TFs. 
This makes loss of function approaches at the protein level very 
demanding. Our transgenic approach, however, can be used to 
assess the contribution of binding motifs to Polycomb recruitment 
irrespective of which TF from a family is binding. We thus ex- 
tended our analysis to study the effect of the SNAIL binding site, 
another motif predicted to recruit H3K27me3 at the NP stage (Fig. 
1). We inserted a total of six regulatory regions containing wild- 
type or mutated SNAIL sites (Fig. 5D) and tested for presence of 
H3K27 trimethylation. As seen with regulatory regions containing 
REST sites, we observe that all constructs containing SNAIL sites are 
sufficient to recruit H3K27me3. Deletion of the SNAIL sites leads to 
significant reduction of H3K27me3 for two of the three constructs 
tested (Fig. 5E). Notably, the construct that showed no significant 
response was the only one that contained only a single predicted 
SNAIL site, suggesting that the effect on H3K27me3 increases with 
the number of sites. 

In summary, we conclude that promoter fragments contain- 
ing bindings sites for SNAIL and REST TFs are sufficient to recruit 
H3K27me3 and, in line with the predictions, that these binding 
sites contribute in cis to H3K27me3 levels. 

Discussion 

Recent genome-wide analyses of chromatin have established un- 
expected dynamics of the epigenome, which reflect cellular and 



developmental states. The analysis of such data has predominantly 
focused on characterizing the different kinds of chromatin do- 
mains that exist and associating these domains with functional 
features such as active or inactive promoters or distal regulatory 
elements (Suzuki and Bird 2008; Meissner 2010; Ernst et al. 2011; 
Zhou et al. 2011). With the exception of chromatin modifications 
that are set by the process of transcription itself, such as H3K36 
methylation, our understanding of how dynamic changes in 
chromatin are regulated remains limited. This likely reflects the 
complexity of the underlying targeting as different recruitment 
mechanisms for chromatin modifiers have been identified, includ- 
ing TFs, noncoding RNAs, as well as higher order nuclear organi- 
zation (Schuettengruber and Cavalli 2009; Simon and Kingston 
2009; Beisel and Paro 2011). 

Here, we have tested the hypothesis that TFs contribute to 
dynamic changes in chromatin during cellular differentiation. We 
combined mapping of epigenetic marks at consecutive stages with 
computational modeling (Epi-MARA) to predict TFs involved in 
recruiting specific chromatin changes ab initio. We started from 
a data set of murine ES cells undergoing neurogenesis, in which 
levels of H3K27me3 were measured at three consecutive cellular 
states during the differentiation. Application of Epi-MARA to this 
data identified several TFs as potential regulators of Polycomb 
dynamics during differentiation. Using several lines of evidence, 
we experimentally validate the prediction that REST is involved in 
transiently recruiting H3K27me3 to promoter regions at the NP 
stage: (1) Genetic deletion reveals that REST is necessary in trans for 
increased H3K27me3 levels at REST targets at the NP stage, specifi- 
cally at high-CpG target regions, which includes almost all pro- 
moter-proximal target regions. (2) Absence of REST causes loss of the 
PRC2 component SUZ12, mirroring the loss H3K27me3 at high- 
CpG regions. (3) Promoter fragments containing a REST binding site 
are sufficient in cis to recruit H3K27me3, whereas identical regions 
with mutated REST binding sites showed reduced recruitment. 
These findings support a model whereby local REST binding recruits 
Polycomb at the induction of in vitro neurogenesis. 

Previous studies have already noted increased H3K27me3 
signal at REST-bound promoters and enrichment of REST binding 
sites at CpG-islands bound by PRC2 (Ku et al. 2008; Zheng et al. 
2009), while a more recent study in human cells showed that the 
noncoding RNA HOTAIR can bind to PRC2 and the LSDl/CoREST/ 
REST complex in vitro (Tsai et al. 2010). However, the function of 
HOTAIR appears not to be conserved in mouse as its genetic de- 
letion has no phenotypic consequences (Schorderet and Duboule 
2011). During the preparation of this manuscript two studies re- 
ported biochemical interaction between REST and members of 
the PRC1 and PRC2 complexes (Ren and Kerppola 2011; Dietrich 
et al. 2012). Importantly, however, these correlative observations 
at single-cell states did not identify the dynamic and context- 
dependent role of REST on H3K27 trimethylation that we predict 
based on chromatin dynamics and further validate experimentally. 
Notably, we show that absence of REST in stem cells has only subtle 
effects on H3K27me3 levels at target regions, suggesting that this 
previously noted co-occurrence of REST and H3K27me3 in stem 
cells has limited functional relevance. Importantly, and as pre- 
dicted by our computational model, REST has more pronounced 
effects for H3K27me3 levels of target regions in NPs (Fig. 4B). 

While the detailed mechanisms of Polycomb targeting remain 
to be determined, our study suggests that, rather than a single 
dominant factor, it likely involves multiple TFs as we found several 
regulatory motifs associated with the increase of H3K27me3 at the 
NP stage. Of these, ZEB1 and the family of SNAIL factors bind to 
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Figure 5. TFBS are required for H3K27me3 recruitment at the NP stage. (A) Strategy to insert promoter regions into a defined genetic site (beta globin 
locus) via RMCE. The two marker genes inserted into the beta globin locus confer resistance against hygromycin (Hy) and sensitivity against ganciclovir 
(Tk), respectively, and are flanked by two inverted lox sites (black triangles). Targeted insertion of a given transgene is achieved by Cre-mediated re- 
combination and negative selection. (B) The RMCE approach was used to insert several REST target promoter fragments with either wild-type sequence 
(WT) or REST site mutation (AREST) into the beta globin locus. Correctly targeted ES cells were differentiated to the NP stage, where H3K27me3 and REST 
were measured at the inserted fragments. (C) For each of the four inserts H3K27me3 levels were measured in cells bearing the WT fragment (red bars) and 
in cells bearing the AREST fragment (green bars). Levels were measured at, from left to right in each panel, the inserted region, the corresponding 
endogenous locus, a positive control, and a negative control region. Note that different promoter regions are used as positive controls in the different 
panels. All inserted WT fragments show significant recruitment of H3K27me3 and loss in H3K27me3 for the AREST fragments. (D) Either wild-type (WT) or 
mutated (MUT) promoter regions containing predicted SNAIL sites were inserted via RMCE. The SNAIL sites were mutated by changing the first and last 
nucleotide of the motif to a Thymidine. Correctly targeted ES cells were differentiated to the NP stage. (£) For each of the three inserts H3K27me3 levels 
were measured in cells bearing the WT promoter (red bars) and in cells bearing promoters with mutated SNAIL sites (green bars). Note that the Cdhl, 
Usp43, and Esam promoter regions have three, two, and one predicted/mutated SNAIL site, respectively. Levels were measured at, from left to right in each 
panel, the inserted region, the corresponding endogenous locus, a positive control, and a negative control region. All H3K27me3 levels are scaled to that 
of the endogenous region and error bars show the standard error of three biological replicates. A P-value is shown and calculated for each insert using 
unpaired one-tailed t-test statistics. 
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similar motifs and are important transcriptional repressors during 
epithelial-to-mesenchymal transition (Cano et al. 2000; Liu et al. 
2008), which is compatible with a proposed function in Polycomb 
recruitment (Herranz et al. 2008). Here we experimentally confirm 
the role of SNAIL sites in regulating H3K27me3 levels using our 
transgenic approach (Fig. 5). Since SP1 sites are among the most 
commonly occurring regulatory sites within CpG-islands, it is dif- 
ficult to interpret whether the predicted role of SP1 in H3K27me3 
dynamics is specific to SP1 or more generally associated with CpG- 
islands, which have been suggested to recruit PRC2 (Mendenhall 
et al. 2010). It is noteworthy however, that SPl-like sites are a 
component of Polycomb Responsive Elements (PRE) in Drosophila 
(Brown and Kassis 2010). In contrast, YY1, the mammalian ortholog 
of PHO, which is the most established TF with a function in Poly- 
comb recruitment in Drosophila melanogaster, is unlikely to have that 
role in mammals (Ku et al. 2008; Mendenhall et al. 2010), at least in 
stem cells. 

Based on recent work in Drosophila (Enderle et al. 2011) and 
mouse stem cells (Landeira et al. 2010; Brookes et al. 2012), it has 
been suggested that Polycomb might repress by stalling poly- 
merases. Our observation that the REST, Snail, and ZEB1 binding 
sites tend to be positioned immediately downstream from TSS 
(Supplemental Fig. 12) is compatible with this model. However, this 
observation does not generally apply to the top nine predicted 
motifs (Supplemental Fig. 12). We further show that the dynamics of 
H3K27me3 are different for high-CpG and low-CpG regions in line 
with a proposed model that local CpG richness influences Polycomb 
recruitment (Mendenhall et al. 2010; Lynch et al. 2011). However, 
the precise role of CpG dinucleotides in Polycomb recruitment has 
not been uncovered. The different behavior of low- and high-CpG 
regions might be explained by different cofactors that associate with 
REST in a CpG-dependent manner that could affect recruitment 
versus inhibition of the Polycomb machinery such that different 
complexes with distinct activities are formed at high-CpG versus 
low-CpG regions. Our data suggest that this is connected to indi- 
vidual TF activity since REST has the strongest effect on H3K27me3 
levels at high-CpG regions and a weaker opposite effect at low-CpG 
regions (Fig. 4B). This opposite behavior at high-CpG and low-CpG 
motifs does not generalize to all TFs (Supplemental Table 5). 

Our results are compatible with a role for cell-type specific 
cofactors since the effect of REST on H3K27me3 are by far strongest 
at the NP stage, whereas REST binding decreases from the ES to the 
NP stage. Nevertheless, the needed regulatory information can be 
highly localized as tested promoter fragments were sufficient to 
recruit H3K27me3 when inserted into a defined genomic region. 
While it remains to be seen if these elements fulfill the definition 
of a PRE, i.e., whether they repress genes in cis in a Polycomb- 
dependent fashion, our results suggest that both REST and SNAIL 
sites could contribute to such function. Clearly, REST and SNAIL 
provide convincing examples for DNA binding motifs that en- 
hance local Polycomb states in the mammalian genome. 

Epi-MARA provides a general methodology for inferring the 
stage-specific activities of TFs associated with chromatin dynamics 
that we foresee will be useful for the study of epigenome maps 
particularly in light of the multitude of data sets that are being 
generated as part of large epigenome initiatives (Satterlee et al. 2010; 
Abbott 201 1). The approach makes use of sophisticated comparative 
genomic TFBS predictions and linear modeling, which accounts for 
the contributions of all regulatory motifs at once. We note that there 
is only a partial overlap between the most significant motifs iden- 
tified by Epi-MARA and those identified by finding the motifs most 
enriched at regions bearing a particular chromatin mark at each 



stage, because Epi-MARA models the relative chromatin mark across 
a time course and the effects of all motifs are incorporated at once 
(Supplemental Table 6). 

The ability to predict TFs involved in regulating chromatin 
dynamics from epigenome data sets provides a powerful tool in 
this context, as predicted TFs can be immediately subjected to 
follow-up experiments. The identification of the context-dependent 
function of REST and the role of the SNAIL sites illustrate its utility. 
Epi-MARA is directly accessible by our web server implementation 
(http://ismara.unibas.ch). 

Importantly, our findings have direct implications for regu- 
latory models of chromatin regulation. In our neurogenesis sys- 
tem, a linear model in terms of predicted binding sites explains 
roughly the same fraction of variance in H3K27me3 at promoters 
as it explains variance in transcript levels (Supplemental Table 7). 
This result suggests that, like regulation of transcription, chroma- 
tin dynamics of H3K27me3 are regulated to a significant extent 
by local DNA sequence motifs that are recognized by trans-acting 
factors. 

Methods 
Epi-MARA 

We here describe the main methods employed in the Epi-MARA 
analysis. Further details are supplied in the Supplemental Methods. 
Supplemental Figure 13 provides a guide to the reader as it gives an 
overview of the different steps of the analysis. Epi-MARA models the 
dynamics of epigenetic marks in terms of predicted TFBSs 
in regulatory regions genome-wide, building on the motif ac- 
tivity response analysis that we developed previously (FANTOM 
Consortium and Riken Omics Science Center 2009). Briefly, 
for each promoter we constructed multiple alignments using 
orthologous sequences from mouse, human, rhesus macaque, 
dog, cow, horse, and opossum, of the proximal promoter region 
consisting of 500 bp both upstream of and downstream from the 
cluster of TSSs that defines the promoter (Balwierz et al. 2009). 
Using databases of experimentally determined binding sites 
(Wingender et al. 1996; Vlieghe et al. 2006), we collected a set of 
207 mammalian regulatory motifs (position specific weight 
matrices) representing the binding specificities of —350 mam- 
malian TFs. Then, using a Bayesian probabilistic method that ex- 
plicitly models the evolution of TFBSs, we predict binding sites 
for all regulatory motifs in all proximal promoter regions (van 
Nimwegen 2007). We summarize the binding site predictions by a 
matrix with components N pm , denoting the sum of the posterior 
probabilities of all binding sites for motif m in promoter/?, which we 
also refer to as the "number" of binding sites for motif m in pro- 
moter p. The second key ingredient of Epi-MARA is the quantifica- 
tion of epigenetic mark levels across the time course at genomic 
regions of interest. For the analysis of the ChlP-chip data, which 
measured H3K27me3 levels at all promoters genome-wide, we 
quantified the H3K7me3 at a given promoter and time point by the 
average log-intensity of the probes that lie within the promoter. 
For the ChlP-seq analysis we determined H3K27me3 enriched re- 
gions (see below) and found that the majority of H3K27me3 
enriched regions are between 3 and 5 kb in length. For the analy- 
sis of ChlP-seq H3K27me3 levels at promoters we quantify the oc- 
currence of H3K27me3 by the log-fraction of ChlP-seq reads in a 
4-kb region centered on the promoter. For the Epi-MARA analysis of 
genome- wide H3K27me3 enriched regions we use the log-fraction 
of ChlP-seq reads in each region. Note that, because Epi-MARA fits 
the changes of H3K27me3 levels across the time course, Epi-MARA 
results are invariant to an overall rescaling of H3K27me3 levels at 
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each promoter. Finally, to avoid spurious fluctuations in relative 
H3K27me3 levels at promoters with low absolute levels, a pseudo 
read-count corresponding to the average read-count in the back- 
ground sample is added to the read-count in each promoter region. 

We denote the amount of the epigenetic mark M in promoter 
p at time point t by M pt and assume the following linear model: 

M pt = noise + c p + X m ^m^mt, 

where c p is the basal level of the chromatin mark at promoter p, and 
A mt is the unknown activity of motif m at time point t Using 
a Bayesian probabilistic framework, we then calculate a joint pos- 
terior probability distribution for all motif activities. To this end, 
we assume that the deviation between model and measured level 
M pt (i.e., the "noise" term in the above formula) is Gaussian dis- 
tributed at each promoter and at each time point. In addition, to 
avoid over-fitting, we use a Gaussian prior on the activities A mt , and 
we determine the variance of this prior by a cross-validation pro- 
cedure. Finally, we infer both the maximal posterior activities A* mt 
and their standard errors a mt . To rank motifs, we measure the im- 
portance of a motif in explaining expression variations by a score 
similar to a z-statistic. The z-score z m of motif m is quantified as an 
average squared z-value of the activity across conditions, i.e. 

Z m - y j , 

where T is the number of time points. Note that our z-scores are 
meant to rank the importance of motifs and cannot be used to 
assess the statistical significance of motif activities. To assess sta- 
tistical significance of the motif activities that we observed, we 
performed the following permutation test: We randomly permuted 
the association between binding site predictions and promoters 
and reran Epi-MARA 1000 times, reporting the z-scores of the 
inferred motif activities for all 207 motifs in each run. Since in the 
1000 permutations there was only one motif in one run with a 
z-score >2.52, we infer that the probability of obtaining a z-score as 
high as z = 2.52 (the z-score of the REST motif on the ChlP-chip 
data) is approximately P = 5 X 10~ 6 . 

To run Epi-MARA on all H3K27me3 enriched regions ge- 
nome-wide, we predicted TFBSs across the entire 4-kb sequence of 
each H3K27me3 region using the same procedure as used for pre- 
dicting sites in proximal promoters. For each H3K27me3 region we 
then determined the 1-kb window that contains the highest 
number of predicted binding sites (pooling all motifs) and we used 
the predicted sites within this 1-kb region for the entries in the site- 
count matrix N pm for the corresponding H3K27me3 region. To infer 
motif activities separately for high- and low-CpG regions we treat, for 
each motif m, sites within low-CpG regions and sites within high- 
CpG regions as if they derived from two separate motifs, effectively 
doubling the number of motifs for which we infer activities. 

Cell culture 

Wild- type mouse ES cells were derived from blastocysts (3.5 PC) of 
mixed 129-C57B1/6 background and cultivated on feeder cells 
(37°C, 7% C0 2 ). REST knockout and corresponding wild- type cells 
were obtained from Helle Jorgensen (Chen et al. 1998; Jorgensen 
et al. 2009). Differentiation of cells was performed as described 
previously (Bibel et al. 2007; Mohn et al. 2008). 

Western blot analysis 

For detection of REST protein levels during differentiation the total 
cell lysates of wild-type and REST knockout cells were used for 
Western blot analysis. The membrane was probed with mouse anti- 



REST (12C11, gift from David Anderson) and rat anti-tubulin (tis- 
sue culture supernatant, cell line YL1/2, ECACC) in combination 
with appropriate secondary antibodies coupled to HRP. 

Immunocytochemistry 

Cells were fixed with 2% paraformaldehyde, either 3 h or 10 d after 
plating, and probed with mouse anti-PAX6 (chick PAX6 aa 1-223, 
DSHB), rabbit anti-NESTIN (Sigma N5413), and mouse anti-TUJl 
(MMS-435P, Covance). Proteins were detected by an appropriate 
secondary antibody conjugated to Alexa Flour. 

Chromatin-IP 

Cells were cross-linked in medium containing 1% formaldehyde 
for 10 min at room temperature. ChIP was carried out as previously 
described (Koch et al. 2007; Weber et al. 2007) with slight modi- 
fications. Antibodies used were a-H3K27me3 (Millipore, #07-449), 
a-SUZ12 (Cell Signaling Technology, #3737S), and a-REST (Santa 
Cruz, #H-290). Chromatin was sonicated for 15 (stem cells) or 18 
cycles (NPs) of 30 sec using a Diagenode Bioruptor. Precipitated 
DNA was either analyzed by quantitative real time PCR or sub- 
jected to next-generation sequencing. 

Quantitative real time PCR 

Real time PCR was performed using SYBR green chemistry (ABI). 
One-fortieth of ChIP sample or 40 ng of input chromatin were used 
per PCR reaction. Primer sequences are available upon request. All 
data are shown with standard error from three biological replicates. 
Significances were calculated using unpaired one-tailed student's 
t-test statistics. 

Next-generation sequencing 

Five to 10 ng of precipitated DNA was prepared for Solexa Se- 
quencing as described (Mikkelsen et al. 2007). Briefly, ChIP DNA 
was ligated to adapters and ligation products of —250 bp were gel- 
purified on 1.5% agarose to remove unligated adaptors. DNA was 
amplified by 18 PCR cycles. DNA sequencing was carried out using 
the Illumina/Solexa Genome Analyzer II (GA2) sequencing system. 

Genomic coordinates 

The July 2007 Mus musculus genome assembly (NCBI37/mm9) 
provided by NCBI (http://www.ncbi.nlm.nih.gov/genome/guide/ 
mouse/) and the Mouse Genome Sequencing Consortium (http:// 
www.sanger.ac.uk/Projects/M_musculus/) was used as a basis for 
all analyses. Annotation of known RefSeq transcripts was obtained 
from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/ 
database/refGene.txt.gz). 

Read filtering, alignment, and weighting 

Low-complexity reads were filtered out based on their dinucleotide 
entropy as follows: For each read, the dinucleotide entropy was 
calculated according to the formula H = f { log(fi), where ft is the 
frequency of dinucleotide i in the read and the sum is over all di- 
nucleotides (i from 1 to 16). The read was filtered out if its H was 
less than half the dinucleotide entropy of the genome, typically 
removing <0.5% of the reads in a given sample. Alignments to the 
mouse genome were performed by the software bowtie (version 
0.9.9.1) (Langmead et al. 2009) with parameters -v 2 -a -m 100, 
tracking up to 100 best alignment positions per query and allowing 
at most two mismatches. Each alignment was weighted by the 
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inverse of the number of hits. All quantifications were based on 
weighted alignments. For generation of wiggle files samples were 
normalized for library size first and files were generated with a 
window size of 100 bp. 

Analysis of sequencing data 

In order to detect REST peaks from the ChlP-seq data we slide 
a window of 1000 bp in length along the genome and calculate, for 
each window, the fraction f IP of all ChlP-seq reads from the REST IP 
and the fraction f bg of all reads from a background sample (input 
DNA) that map to the window (since background counts are 
generally smaller, we use a window of 2000 bp centered at the same 
position to obtain more robust background frequencies). Inspect- 
ing the reverse-cumulative distribution of background counts 
across the genome, we observed that a small subset of windows 
showed aberrantly high background frequencies f bg (Supplemental 
Fig. 14B) and these windows were removed from further consid- 
eration (these windows typically correspond to regions with re- 
peats that presumably occur more frequently in the genome of the 
cells from which our DNA was taken than in the mm9 genome 
assembly). We assume that the noise in the estimated f IP and f bg 
follows Poisson distributions and calculate, for each window, a 

z-statistic: z = ^ bg , where N IP and N bg are the total numbers of 

If IP .^g 

reads in the IP and background sample, respectively. Inspecting the 
reverse-cumulative distribution of z-statistics across the genome, 
we observe a long tail of highly enriched regions to the right of z = 
3.1 (Supplemental Fig. 14 A) and we denote all regions with con- 
secutive windows with z- values >3.1 as REST binding regions. To 
determine the false discovery rate of binding region prediction at 
this cutoff we made use of the fact that we measured the back- 
ground distribution in duplicate and performed binding region 
prediction in the exact same way, treating one of the background 
samples as a "foreground" sample. From this we find that, by chance, 
a fraction 1.2 X 10~ 4 windows genome-wide have a z- value >3.1, 
leading to a total of 289 falsely discovered binding regions genome- 
wide, which should be compared with the 1624 REST binding re- 
gions determined from the REST IP sample. Any REST binding 
region whose center is <2 kb from a known TSS (from the RefSeq 
collection of transcripts) is considered proximal. All other REST 
binding regions are considered distal. 

To predict REST binding sites for all REST binding regions we 
again produced multiple alignments of orthologous regions from 
mouse, human, rhesus macaque, dog, cow, horse, and opossum, 
and ran the MotEvo algorithm (van Nimwegen 2007) on each 
multiple alignment. We also searched for noncanonical sites of 
arbitrary spacing between the two half-sites of the REST motif. In 
contrast to previous work (Johnson et al. 2007) which found only 
noncanonical sites with a spacer of 6 bp, we find noncanonical 
sites with spacers of both 6 and 7 bp. Linear regression between the 
total number of predicted REST binding sites (i.e., the sum of 
posteriors of all predicted sites) at each REST binding region and 
the binding z-statistic of the region shows a correlation of r = 0.48 
(P-value 2.9 X 10" 53 ). We compared our predicted REST binding 
regions with those of Johnson et al. (2008) by collecting all regions 
they report and, for each region, calculating a REST fold-change 
enrichment of ChlP-seq reads. We then calculated the overlap of 
the predicted binding regions for a fold-change enrichment of 2 
(Supplemental Fig. 4). 

To obtain positional profiles with respect to TSS for the pre- 
dicted binding sites of REST and other regulatory motifs, we sum- 
med the posteriors of all predicted binding sites at promoters at 
each position relative to TSS. To obtain positional profiles for the 
REST binding data and H3K27me3 signals we simply summed all 



reads from the corresponding IP samples at each position relative 
to TSS. To obtain positional profiles of H3K27me3 and SUZ12 
relative to REST sites we selected all genomic regions that were 
enriched for H3K27me3 and overlapped a REST binding peak. 
For each of these we located the position of the highest scoring 
predicted REST binding site within the peak and then calculated 
the relative frequencies of reads, separately for SUZ12 and H3K27me3, 
in the 3 kb of sequence upstream of and downstream from the REST 
site's position. We finally averaged these relative frequency profiles 
over all REST peaks within H3K27me3 enriched regions. 

To perform Epi-MARA analysis with the REST binding data 
replacing REST binding site predictions we replace the predicted 
binding site counts N pRES T with results of the REST binding assay at 
each promoter p. Since the z-statistics of REST binding at pro- 
moters have a very different distribution of values from those of 
the site counts N pm , it is necessary to normalize the matrix N pm 
such that binding site predictions and binding data can be quan- 
titatively combined. We therefore replaced the matrix N pm with a 
binary matrix B pm in which B pm = 1 whenever N pm > 0.2 and B pm = 0 
otherwise. Finally, we replaced the column B pREST with one based 
on the REST binding data, i.e., where B pRES T= 1 whenever there was 
a REST binding peak within 2 kb of the corresponding promoter, 
and zero otherwise. 

For the analysis of the H3K27me3 data (NCBI Gene Expres- 
sion Omnibus [GEO] [http://www.ncbi.nlm.nih.gov/geo/] acces- 
sion number GSE25533) we slide a 2-kb window along the genome 
and calculate a z-statistic for each window quantifying the en- 
richment of the signal in the IP over the background sample by a 
z-statistic as above, pooling the data from the replicates and from 
the different stages. As for the REST binding data, we observe a long 
tail of high z-values which occurs for the H3K27me3 data to the right 
of z = 4.0 (Supplemental Fig. 14C) and we defined H3K27me3 
enriched regions as consecutive windows that all have a z-statistic 
>4.0. Using again the two background samples (Supplemental Fig. 
14D) to determine a false discovery rate we find that, by chance, 
a fraction of 2.8 X 10~ 4 of windows have a z- value >4.0, leading to 
427 false predicted H3K27me3 regions genome-wide, which 
should be compared with the 18,293 regions predicted from the 
H3K27me3 IP samples. The H3K27me3 enriched regions are di- 
vided into different classes using a number of criteria. Regions that 
overlapped a proximal promoter, i.e., a promoter ±500 bp, were 
considered proximal and all others were considered distal. Simi- 
larly, regions that overlap a REST binding peak were considered 
REST targets and all others nontargets. For each region enriched in 
H3K27me3 we slide a 1-kb window over the region and calculated 
the CpG dinucleotide frequency within each window. We defined 
the CpG content of a region as the highest CpG frequency of a 1-kb 
window within it. Inspection of the distribution of log-CpG content 
across H3K27me3 enriched regions shows two classes, and we fitted 
the distribution of log-CpG content by a mixture of two Gaussians 
(Fig. 3 A). After fitting of the Gaussian mixture, posterior probabilities 
for each region to belong to the high-CpG or low-CpG class were 
calculated in the standard Bayesian way. In subsequent analyses, 
distributions for low-CpG and high-CpG regions were obtained by 
weighing each region with the posterior probability that it belongs to 
the corresponding class. 

For each region that was enriched for H3K27me3 at any of the 
stages, we calculated log-fold changes between ES and NP and 
between NP and TN stages by calculating the log-ratios of the frac- 
tions of reads from the corresponding IP samples mapping to each 
of the regions. 

To compare H3K27me3 levels between wild-type (WT) and 
RESTko mutant (KO) cells we collected all regions that were en- 
riched for H3K27me3 in the wild-type cells at any of the stages. For 
each region we calculated the fractions f WT and f KO of all IP 
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reads that mapped to that region in WT and KO and calculated 
both the absolute intensity X= (log (f WT ) + log (f K0 )) (summed 
over all replicates) as well as the log-ratio: Y - log^jj (averaged 
over the replicates). Supplemental Figure 15 shows, as a function of 
absolute intensity X, the average and standard enor of Y for all re- 
gions that are non-REST targets (black dots with error bars) for both 
high-CpG and low-CpG regions at both the ES and NP stages. As 
these figures make clear, there are some systematic differences in the 
overall distribution of H3K27me3 signals between wild- type and the 
RESTko cells. Therefore, in order to properly compare H3K27me3 
signals between wild type and RESTko, we adopted a normalization 
procedure similar to LOESS normalization. For each stage, we sorted 
all nontarget regions by their absolute intensity X (averaging wild- 
type and RESTko intensities). For each region we then collected the 
50 regions with values of X immediately below, and the 50 regions 
with values of X immediately above, and calculated the mean \± and 
standard deviation a. In this way we estimated the expected mean /x 
and standard deviation a of nontargets, as a function of their ab- 
solute H3K27me3 levels. For each REST target we determined both 
its fold-change Y and absolute H3K27me3 level X and calculated a 
z- value = (Y- /x)/o- using the expected mean and standard deviation 
of nontargets with absolute levels of H3K27me3 of X. To suppress 
fluctuations we averaged the z-statistics with a Gaussian kernel. 
Note that, per definition, the z-values of nontarget regions follow 
a Gaussian distribution of mean zero and standard deviation one. To 
estimate the fraction p of REST targets that significantly change 
H3K27me3 we compared the fraction of REST targets that show 
z-values more than one standard deviation away from the mean 
(i.e., z > 1 when considering targets losing H3K27me3 and z < -1 
when considering targets gaining H3K27me3) with the fraction 
expected by chance using a Bayesian procedure. Let q denote the 
probability to obtain a z- value >1 by chance according to the stan- 
dard Gaussian. Conservatively assuming that all true targets must 
have a z- value >1, the probability for a randomly chosen target to 
have a z- value >1 is P = p + (1 - p)q. Given that there are N REST 
targets in total, of which n have a z-value >1 we use Bayes' theorem 
to calculate a posterior probability distribution over p and estimate 
its mean and standard deviation. We similarly estimate the fraction 
of targets that significantly gain H3K27me3, separately for each 
stage, and separately for high- and low-CpG target regions. To com- 
pare SUZ12 levels in wild-type and RESTko NPs we first, for each 
H3K27me3 enriched region, determined the 1-kb region that had 
the highest overall read-count from the SUZ12 ChlP-seqs. We then 
determined wild- type and knockout SUZ12 levels from these 1-kb 
regions. The z-statistics for the change in SUZ12 levels were then 
calculated in the exact same way, i.e., comparing the log fold-change 
in SUZ12 of each REST target with those of the 100 nontarget regions 
with the nearest absolute level in SUZ12 from the same CpG class. 

RNA preparation and expression analysis 

Total RNA was prepared using TRIzol (Invitrogen). mRNA expres- 
sion data were generated using Mouse Gene 1.0 ST and Mouse 
Genome 430 2.0 arrays. Microarrays were RMA-normalized using 
R/Bioconductor (Gentleman et al. 2004) and the oligo package 
version 1.14.0 (Carvalho and Irizarry 2010). To determine tran- 
scriptional regulation of REST target genes in the RESTko we se- 
lected a twofold change as cutoff for significant up-regulation. 

Recombinase mediated cassette exchange (RMCE) 

One- to two-kilobase promoter fragments of REST targets and pre- 
dicted SNAIL targets were cloned and stably integrated into stem 



cells via RMCE as described (Feng et al. 1999; Lienert et al. 2011). 
AREST binding site mutants were generated by removing 15-20 bp 
of the REST consensus sequence. SNAIL mutant sites were created by 
changing the first and last nucleotide of the site to a T nucleotide. 
Primer sequences are available upon request. 

Data access 

All generated data sets are available for download at the NCBI Gene 
Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) 
under accession numbers GSE27148 and GSE27114. 
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